Enrichment of regulatory signals in conserved non-coding genomic sequence
Abstract
MOTIVATION Whole genome shotgun sequencing strategies generate sequence data before assembly methodologies are applied to produce contiguous sequence. Sequence reads can be used to indicate regions of conservation between closely related species for which only one genome has been assembled. Consequently, by using pairwise sequence alignment methods it is possible to identify novel, non-repetitive segments of non-coding sequence that are conserved between the assembled human genome and mouse whole genome shotgun sequencing fragments. Conserved non-coding regions identify potentially functional DNA that could be involved in transcriptional regulation.
RESULTS Local sequence alignment methods were applied to mouse fragments and the assembled human genome. In addition, transcription factor binding sites were detected by aligning their corresponding positional weight matrices to the sequence regions. These methods were applied to a set of transcripts corresponding to 502 genes associated with a variety of human diseases, taken from the Online Mendelian Inheritance in Man database. Using statistical arguments we have shown that conserved non-coding segments are enriched in transcription factor binding sites relative to the sequence background in which the conserved segments are located. This enrichment of binding sites was not observed in coding sequence. Conserved non-coding segments are not extensively repeated in the genome, and therefore their identification provides a rapid means of finding genes with related conserved regions and, consequently, potentially related regulatory mechanisms. Conserved segments in upstream regions are found to contain binding sites that are co-localized in a manner consistent with experimentally known pairwise co-occurrences of transcription factors (TFs), and they afford the identification of novel co-occurring TF pairs.
This study provides a methodology, and further evidence, suggesting that conserved non-coding regions are biologically significant, since they contain a statistical enrichment of regulatory signals and of pairs of signals that enable the construction of regulatory models for human genes. CONTACT [email protected].
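The two computational steps named in the abstract, scoring sequence windows against a positional weight matrix and comparing site density in conserved segments with the surrounding background, can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration: the 4-position count matrix `COUNTS` is made up, the threshold is arbitrary, and the one-sided binomial test is a simple stand-in, since the abstract does not say which statistical argument the authors used. Treating overlapping windows as independent trials is a further simplification.

```python
import math

# Hypothetical count matrix (one dict per motif position).
# The counts are invented for illustration and describe no real factor.
COUNTS = [
    {"A": 8, "C": 0, "G": 0, "T": 0},
    {"A": 0, "C": 0, "G": 8, "T": 0},
    {"A": 0, "C": 8, "G": 0, "T": 0},
    {"A": 0, "C": 0, "G": 0, "T": 8},
]


def log_odds(counts, pseudocount=1.0, background=0.25):
    """Convert a count matrix into a log-odds positional weight matrix."""
    pwm = []
    for column in counts:
        total = sum(column.values()) + 4 * pseudocount
        pwm.append({
            base: math.log2((column[base] + pseudocount) / total / background)
            for base in "ACGT"
        })
    return pwm


def scan(sequence, pwm, threshold):
    """Return start offsets of windows whose score reaches the threshold."""
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if any(base not in "ACGT" for base in window):
            continue  # skip windows containing ambiguous bases such as N
        score = sum(pwm[i][base] for i, base in enumerate(window))
        if score >= threshold:
            hits.append(start)
    return hits


def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(
        math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1)
    )


def enrichment(conserved, background, pwm, threshold):
    """Count hits in the conserved segment and test them against the
    per-window hit rate of the background sequence (assumed test)."""
    width = len(pwm)
    bg_rate = len(scan(background, pwm, threshold)) / (len(background) - width + 1)
    observed = len(scan(conserved, pwm, threshold))
    p_value = binomial_tail(observed, len(conserved) - width + 1, bg_rate)
    return observed, p_value


pwm = log_odds(COUNTS)
observed, p_value = enrichment("AGCTAGCTAGCT", "AGCT" + "T" * 36, pwm, 6.0)
```

With these toy inputs, only exact matches to the invented motif pass the threshold, so `observed` counts the perfect occurrences in the conserved segment and `p_value` reports how surprising that count is given the background rate. A real analysis would use curated matrices and a background model suited to the genomic region.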
Journal title:
- Bioinformatics
Volume 17, Issue 10
Pages -
Publication date 2001